Delete by query API

2024-06-26 23:48:58 | Source: compiled from the web

Examples

Delete all documents from the my-index-000001 data stream or index:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        conflicts="proceed",
        body={"query": {"match_all": {}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      conflicts: 'proceed',
      body: {
        query: {
          match_all: {}
        }
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?conflicts=proceed
    {
      "query": {
        "match_all": {}
      }
    }

Delete documents from multiple data streams or indices:

Python:

    resp = client.delete_by_query(
        index=["my-index-000001", "my-index-000002"],
        body={"query": {"match_all": {}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001,my-index-000002',
      body: {
        query: {
          match_all: {}
        }
      }
    )
    puts response

Console:

    POST /my-index-000001,my-index-000002/_delete_by_query
    {
      "query": {
        "match_all": {}
      }
    }

Limit the delete by query operation to the shards that a particular routing value maps to:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        routing="1",
        body={"query": {"range": {"age": {"gte": 10}}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      routing: 1,
      body: {
        query: {
          range: {
            age: {
              gte: 10
            }
          }
        }
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?routing=1
    {
      "query": {
        "range": {
          "age": {
            "gte": 10
          }
        }
      }
    }

By default _delete_by_query uses scroll batches of 1000. You can change the batch size with the scroll_size URL parameter:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        scroll_size="5000",
        body={"query": {"term": {"user.id": "kimchy"}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      scroll_size: 5000,
      body: {
        query: {
          term: {
            'user.id' => 'kimchy'
          }
        }
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?scroll_size=5000
    {
      "query": {
        "term": {
          "user.id": "kimchy"
        }
      }
    }

Delete a document using a unique attribute:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        body={"query": {"term": {"user.id": "kimchy"}}, "max_docs": 1},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      body: {
        query: {
          term: {
            'user.id' => 'kimchy'
          }
        },
        max_docs: 1
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query
    {
      "query": {
        "term": {
          "user.id": "kimchy"
        }
      },
      "max_docs": 1
    }

Slice manually

Slice a delete by query manually by providing a slice id and total number of slices:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        body={
            "slice": {"id": 0, "max": 2},
            "query": {"range": {"http.response.bytes": {"lt": 2000000}}},
        },
    )
    print(resp)

    resp = client.delete_by_query(
        index="my-index-000001",
        body={
            "slice": {"id": 1, "max": 2},
            "query": {"range": {"http.response.bytes": {"lt": 2000000}}},
        },
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      body: {
        slice: { id: 0, max: 2 },
        query: {
          range: {
            'http.response.bytes' => { lt: 2_000_000 }
          }
        }
      }
    )
    puts response

    response = client.delete_by_query(
      index: 'my-index-000001',
      body: {
        slice: { id: 1, max: 2 },
        query: {
          range: {
            'http.response.bytes' => { lt: 2_000_000 }
          }
        }
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query
    {
      "slice": {
        "id": 0,
        "max": 2
      },
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

    POST my-index-000001/_delete_by_query
    {
      "slice": {
        "id": 1,
        "max": 2
      },
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }
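The two requests above differ only in the slice id, so the bodies can be generated rather than written by hand. The following is a minimal sketch, not part of any client library: the helper name build_slice_bodies is hypothetical, and it assumes that max must be greater than 1, as it is for sliced scroll.

```python
from typing import Any


def build_slice_bodies(query: dict[str, Any], max_slices: int) -> list[dict[str, Any]]:
    """Return one delete-by-query request body per slice, with ids 0..max_slices-1.

    Hypothetical helper: it only builds the request bodies and sends nothing.
    """
    # Assumption: a slice count below 2 is rejected, as for sliced scroll.
    if max_slices < 2:
        raise ValueError("max_slices must be at least 2")
    return [
        {"slice": {"id": slice_id, "max": max_slices}, "query": query}
        for slice_id in range(max_slices)
    ]


query = {"range": {"http.response.bytes": {"lt": 2000000}}}
bodies = build_slice_bodies(query, max_slices=2)
for body in bodies:
    print(body)
```

Each generated body could then be passed as the body argument of a separate client.delete_by_query call, one call per slice.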

You can verify that the sliced deletes worked with:

Python:

    resp = client.indices.refresh()
    print(resp)

    resp = client.search(
        index="my-index-000001",
        size="0",
        filter_path="hits.total",
        body={"query": {"range": {"http.response.bytes": {"lt": 2000000}}}},
    )
    print(resp)

Ruby:

    response = client.indices.refresh
    puts response

    response = client.search(
      index: 'my-index-000001',
      size: 0,
      filter_path: 'hits.total',
      body: {
        query: {
          range: {
            'http.response.bytes' => { lt: 2_000_000 }
          }
        }
      }
    )
    puts response

Console:

    GET _refresh

    POST my-index-000001/_search?size=0&filter_path=hits.total
    {
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

This returns a sensible total like this one:

    {
      "hits": {
        "total": {
          "value": 0,
          "relation": "eq"
        }
      }
    }

Use automatic slicing

You can also let delete-by-query automatically parallelize using sliced scroll to slice on _id. Use slices to specify the number of slices to use:

Python:

    resp = client.delete_by_query(
        index="my-index-000001",
        refresh=True,
        slices="5",
        body={"query": {"range": {"http.response.bytes": {"lt": 2000000}}}},
    )
    print(resp)

Ruby:

    response = client.delete_by_query(
      index: 'my-index-000001',
      refresh: true,
      slices: 5,
      body: {
        query: {
          range: {
            'http.response.bytes' => { lt: 2_000_000 }
          }
        }
      }
    )
    puts response

Console:

    POST my-index-000001/_delete_by_query?refresh&slices=5
    {
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

You can also verify that this works with:

Python:

    resp = client.search(
        index="my-index-000001",
        size="0",
        filter_path="hits.total",
        body={"query": {"range": {"http.response.bytes": {"lt": 2000000}}}},
    )
    print(resp)

Ruby:

    response = client.search(
      index: 'my-index-000001',
      size: 0,
      filter_path: 'hits.total',
      body: {
        query: {
          range: {
            'http.response.bytes' => { lt: 2_000_000 }
          }
        }
      }
    )
    puts response

Console:

    POST my-index-000001/_search?size=0&filter_path=hits.total
    {
      "query": {
        "range": {
          "http.response.bytes": {
            "lt": 2000000
          }
        }
      }
    }

This returns a sensible total like this one:

    {
      "hits": {
        "total": {
          "value": 0,
          "relation": "eq"
        }
      }
    }

Setting slices to auto will let Elasticsearch choose the number of slices to use. This setting will use one slice per shard, up to a certain limit. If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.
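The selection rule described above can be illustrated with a small sketch. This is not Elasticsearch code: the function name auto_slice_count and the slice_limit parameter are hypothetical, and the real limit is not stated in this text, so it is passed in by the caller.

```python
def auto_slice_count(shards_per_index: list[int], slice_limit: int) -> int:
    """Illustrate the 'auto' rule: one slice per shard, capped at a limit.

    With several source indices, the index with the fewest shards decides.
    slice_limit is an assumed parameter standing in for the unnamed limit.
    """
    if not shards_per_index:
        raise ValueError("at least one index is required")
    return min(min(shards_per_index), slice_limit)


# Three source indices with 5, 3 and 8 shards: the smallest count (3) wins.
print(auto_slice_count([5, 3, 8], slice_limit=20))
```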

Adding slices to _delete_by_query just automates the manual process used in the section above, creating sub-requests which means it has some quirks:

- You can see these requests in the Tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with slices will rethrottle the unfinished sub-request proportionally.
- Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices each sub-request won’t get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the point above about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being deleted.
- Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.

Change throttling for a request

The value of requests_per_second can be changed on a running delete by query using the _rethrottle API. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.

PHP:

    $params = [
        'task_id' => 'r1A2WoRbTwKZ516z6NEs5A:36619',
    ];
    $response = $client->deleteByQueryRethrottle($params);

Python:

    resp = client.delete_by_query_rethrottle(
        task_id="r1A2WoRbTwKZ516z6NEs5A:36619",
        requests_per_second="-1",
    )
    print(resp)

Ruby:

    response = client.delete_by_query_rethrottle(
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619',
      requests_per_second: -1
    )
    puts response

Go:

    res, err := es.DeleteByQueryRethrottle(
        "r1A2WoRbTwKZ516z6NEs5A:36619",
        esapi.IntPtr(-1),
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.deleteByQueryRethrottle({
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619',
      requests_per_second: '-1'
    })
    console.log(response)

Console:

    POST _delete_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1

Use the tasks API to get the task ID. Set requests_per_second to any positive decimal value or -1 to disable throttling.
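As a sketch of the rule just stated, the request path can be built with a check that rejects values other than a positive number or -1. The helper name rethrottle_path is hypothetical; only the path shape and the allowed values come from the text above.

```python
from urllib.parse import quote


def rethrottle_path(task_id: str, requests_per_second: float) -> str:
    """Build the _rethrottle request path for a running delete by query.

    Hypothetical helper: accepts any positive number, or -1 to disable throttling.
    """
    if requests_per_second != -1 and requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive, or -1 to disable throttling")
    # Keep the ':' that separates node id and task number readable in the path.
    encoded_id = quote(task_id, safe=":")
    return f"_delete_by_query/{encoded_id}/_rethrottle?requests_per_second={requests_per_second}"


print(rethrottle_path("r1A2WoRbTwKZ516z6NEs5A:36619", -1))
```

The resulting string matches the console request shown above and would be sent with POST.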

Get the status of a delete by query operation

Use the tasks API to get the status of a delete by query operation:

PHP:

    $response = $client->tasks()->list();

Python:

    resp = client.tasks.list(
        detailed="true",
        actions="*/delete/byquery",
    )
    print(resp)

Ruby:

    response = client.tasks.list(
      detailed: true,
      actions: '*/delete/byquery'
    )
    puts response

Go:

    res, err := es.Tasks.List(
        es.Tasks.List.WithActions("*/delete/byquery"),
        es.Tasks.List.WithDetailed(true),
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.tasks.list({
      detailed: 'true',
      actions: '*/delete/byquery'
    })
    console.log(response)

Console:

    GET _tasks?detailed=true&actions=*/delete/byquery

The response looks like:

    {
      "nodes" : {
        "r1A2WoRbTwKZ516z6NEs5A" : {
          "name" : "r1A2WoR",
          "transport_address" : "127.0.0.1:9300",
          "host" : "127.0.0.1",
          "ip" : "127.0.0.1:9300",
          "attributes" : {
            "testattr" : "test",
            "portsfile" : "true"
          },
          "tasks" : {
            "r1A2WoRbTwKZ516z6NEs5A:36619" : {
              "node" : "r1A2WoRbTwKZ516z6NEs5A",
              "id" : 36619,
              "type" : "transport",
              "action" : "indices:data/write/delete/byquery",
              "status" : {
                "total" : 6154,
                "updated" : 0,
                "created" : 0,
                "deleted" : 3500,
                "batches" : 36,
                "version_conflicts" : 0,
                "noops" : 0,
                "retries": 0,
                "throttled_millis": 0
              },
              "description" : ""
            }
          }
        }
      }
    }

The status object contains the actual status. It is just like the response JSON with the important addition of the total field. total is the total number of operations that the delete by query expects to perform. You can estimate the progress by adding the updated, created, and deleted fields. The request will finish when their sum is equal to the total field.
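The progress estimate described above is a short calculation. The sketch below applies it to the status values from the sample response; the function name progress is hypothetical, and treating a total of 0 as complete is an assumption made here to avoid dividing by zero.

```python
def progress(status: dict) -> float:
    """Estimate completion as (updated + created + deleted) / total."""
    total = status["total"]
    if total == 0:
        # Assumption: nothing to do counts as finished.
        return 1.0
    done = status["updated"] + status["created"] + status["deleted"]
    return done / total


# Values copied from the "status" object in the sample tasks response.
status = {"total": 6154, "updated": 0, "created": 0, "deleted": 3500}
print(f"{progress(status):.1%}")
```

For the sample values this comes to a little under 57 percent, since 3500 of the expected 6154 operations are done.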

With the task id you can look up the task directly:

PHP:

    $params = [
        'task_id' => 'r1A2WoRbTwKZ516z6NEs5A:36619',
    ];
    $response = $client->tasks()->get($params);

Python:

    resp = client.tasks.get(
        task_id="r1A2WoRbTwKZ516z6NEs5A:36619",
    )
    print(resp)

Ruby:

    response = client.tasks.get(
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    )
    puts response

Go:

    res, err := es.Tasks.Get(
        "r1A2WoRbTwKZ516z6NEs5A:36619",
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.tasks.get({
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    })
    console.log(response)

Console:

    GET /_tasks/r1A2WoRbTwKZ516z6NEs5A:36619

The advantage of this API is that it integrates with wait_for_completion=false to transparently return the status of completed tasks. If the task is completed and wait_for_completion=false was set on it then it’ll come back with results or an error field. The cost of this feature is the document that wait_for_completion=false creates at .tasks/task/${taskId}. It is up to you to delete that document.

Cancel a delete by query operation

Any delete by query can be canceled using the task cancel API:

PHP:

    $params = [
        'task_id' => 'r1A2WoRbTwKZ516z6NEs5A:36619',
    ];
    $response = $client->tasks()->cancel($params);

Python:

    resp = client.tasks.cancel(
        task_id="r1A2WoRbTwKZ516z6NEs5A:36619",
    )
    print(resp)

Ruby:

    response = client.tasks.cancel(
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    )
    puts response

Go:

    res, err := es.Tasks.Cancel(
        es.Tasks.Cancel.WithTaskID("r1A2WoRbTwKZ516z6NEs5A:36619"),
    )
    fmt.Println(res, err)

JavaScript:

    const response = await client.tasks.cancel({
      task_id: 'r1A2WoRbTwKZ516z6NEs5A:36619'
    })
    console.log(response)

Console:

    POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel

The task ID can be found using the tasks API.

Cancellation should happen quickly but might take a few seconds. The task status API above will continue to list the delete by query task until the task checks that it has been canceled and terminates itself.
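Because the task can stay listed for a few seconds after a cancel request, a caller may want to poll until it disappears. The following is a minimal sketch under stated assumptions: wait_until_gone is a hypothetical helper, and list_task_ids stands in for whatever function returns the ids currently reported by the tasks API. A fake is used here so the example runs without a cluster.

```python
import time
from typing import Callable, Iterable


def wait_until_gone(
    list_task_ids: Callable[[], Iterable[str]],
    task_id: str,
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll until task_id is no longer listed; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if task_id not in set(list_task_ids()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake task listing: the task is still reported twice, then it is gone.
snapshots = iter([["n:1"], ["n:1"], []])
print(wait_until_gone(lambda: next(snapshots), "n:1", timeout=5.0, interval=0.01))
```

In real use, list_task_ids would wrap a tasks API call filtered on */delete/byquery and extract the task ids from the response.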


